Lesson 3


What to Do First?

Notes:


Pseudo-Facebook User Data

Notes:

#Setting styling of plots to the same as in the instructor videos
#theme_set(theme_minimal(24)) 
#Overriding with my own preference
#theme_set(theme_minimal(14)) 
#Remember to set correct working directory first
pf <- read.delim("pseudo_facebook.tsv")
names(pf)
##  [1] "userid"                "age"                  
##  [3] "dob_day"               "dob_year"             
##  [5] "dob_month"             "gender"               
##  [7] "tenure"                "friend_count"         
##  [9] "friendships_initiated" "likes"                
## [11] "likes_received"        "mobile_likes"         
## [13] "mobile_likes_received" "www_likes"            
## [15] "www_likes_received"
summary(pf)
##      userid             age            dob_day         dob_year   
##  Min.   :1000008   Min.   : 13.00   Min.   : 1.00   Min.   :1900  
##  1st Qu.:1298806   1st Qu.: 20.00   1st Qu.: 7.00   1st Qu.:1963  
##  Median :1596148   Median : 28.00   Median :14.00   Median :1985  
##  Mean   :1597045   Mean   : 37.28   Mean   :14.53   Mean   :1976  
##  3rd Qu.:1895744   3rd Qu.: 50.00   3rd Qu.:22.00   3rd Qu.:1993  
##  Max.   :2193542   Max.   :113.00   Max.   :31.00   Max.   :2000  
##                                                                   
##    dob_month         gender          tenure        friend_count   
##  Min.   : 1.000   female:40254   Min.   :   0.0   Min.   :   0.0  
##  1st Qu.: 3.000   male  :58574   1st Qu.: 226.0   1st Qu.:  31.0  
##  Median : 6.000   NA's  :  175   Median : 412.0   Median :  82.0  
##  Mean   : 6.283                  Mean   : 537.9   Mean   : 196.4  
##  3rd Qu.: 9.000                  3rd Qu.: 675.0   3rd Qu.: 206.0  
##  Max.   :12.000                  Max.   :3139.0   Max.   :4923.0  
##                                  NA's   :2                        
##  friendships_initiated     likes         likes_received    
##  Min.   :   0.0        Min.   :    0.0   Min.   :     0.0  
##  1st Qu.:  17.0        1st Qu.:    1.0   1st Qu.:     1.0  
##  Median :  46.0        Median :   11.0   Median :     8.0  
##  Mean   : 107.5        Mean   :  156.1   Mean   :   142.7  
##  3rd Qu.: 117.0        3rd Qu.:   81.0   3rd Qu.:    59.0  
##  Max.   :4144.0        Max.   :25111.0   Max.   :261197.0  
##                                                            
##   mobile_likes     mobile_likes_received   www_likes       
##  Min.   :    0.0   Min.   :     0.00     Min.   :    0.00  
##  1st Qu.:    0.0   1st Qu.:     0.00     1st Qu.:    0.00  
##  Median :    4.0   Median :     4.00     Median :    0.00  
##  Mean   :  106.1   Mean   :    84.12     Mean   :   49.96  
##  3rd Qu.:   46.0   3rd Qu.:    33.00     3rd Qu.:    7.00  
##  Max.   :25111.0   Max.   :138561.00     Max.   :14865.00  
##                                                            
##  www_likes_received 
##  Min.   :     0.00  
##  1st Qu.:     0.00  
##  Median :     2.00  
##  Mean   :    58.57  
##  3rd Qu.:    20.00  
##  Max.   :129953.00  
## 

Histogram of Users’ Birthdays

Notes:

#install.packages('ggplot2')
library(ggplot2)

qplot(x = dob_day, data = pf, binwidth = 1) + 
#Setting bins to be 1 for each day of the month
    scale_x_continuous(breaks=1:31)

#Also possible with ggplot()
ggplot(aes(x = dob_day), data = pf) + 
  geom_histogram(binwidth = 1) + 
  scale_x_continuous(breaks = 1:31)


What are some things that you notice about this histogram?

Response: I notice that a disproportionate number of users have birthdays on the first day of the month. I suspect this is due to incorrect information entered by the user: the easiest way to fill out date information in a form is to leave the day at 1.

Fewer users have birthdays on day 31 than on other days, which makes sense as only 7 of the 12 months in a year have 31 days.
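The 7-out-of-12 figure can be checked directly. The following is a minimal base-R sketch, not part of the course material; it counts how many months contain each day of the month, using 2013 as an arbitrary non-leap year (an assumption made only for illustration).

```r
# Every calendar date in an arbitrary non-leap year (2013 is an assumption).
dates <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")

# Day of the month (1..31) for each date, then the number of months
# in which each day occurs.
day_of_month <- as.integer(format(dates, "%d"))
months_with_day <- table(day_of_month)

months_with_day[c("28", "29", "30", "31")]
```

If birthdays were spread evenly over the year, day 31 would therefore be expected to hold roughly 7/12 as many users as a day that occurs in every month.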

Moira’s Investigation

Notes: There’s a mismatch between people’s perception of the audience size of their own Facebook posts and the actual audience size.

Estimating Your Audience Size

Notes:


Think about a time when you posted a specific message or shared a photo on Facebook. What was it?

Response: I posted a short message and shared the “for sale” ad when the neighbors were going to sell their apartment.

How many of your friends do you think saw that post?

Response: 60

Think about what percent of your friends on Facebook see any posts or comments that you make in a month. What percent do you think that is?

Response: 15%


Perceived Audience Size

Notes: Moira says that people dramatically underestimated the size of their audience: their guesses were about 25% of the actual size.


Faceting

Notes:

qplot(x = dob_day, data = pf, binwidth = 1) + 
  scale_x_continuous(breaks = 1:31) + 
  facet_wrap(~dob_month, ncol = 3)

Let’s take another look at our plot. What stands out to you here?

Response: This is consistent with my earlier suspicion: almost all of the users who selected day 1 of the month also selected month 1, which points to default or incorrect user input rather than real birthdays.
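The visual impression from the facets can also be backed by a count. This is a small sketch on hypothetical toy data (the `toy` data frame and its values are made up); with the real data, `pf` would take the place of `toy`.

```r
# Hypothetical toy sample of birthdays (month, day); not the real data.
toy <- data.frame(
  dob_month = c(1, 1, 1, 1, 3, 7, 7, 12),
  dob_day   = c(1, 1, 1, 15, 1, 4, 31, 25)
)

# Among users whose day is 1, count how many fall in each month.
first_day <- subset(toy, dob_day == 1)
month_counts <- table(first_day$dob_month)
month_counts

# Share of "day 1" birthdays that are also in month 1.
share_jan <- mean(first_day$dob_month == 1)
share_jan
```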

Be Skeptical - Outliers and Anomalies

Notes: Anomalies and outliers have to be considered in the context of your data.

Moira’s Outlier

Notes:

Which case do you think applies to Moira’s outlier?

Response:


Friend Count

Notes:

What code would you enter to create a histogram of friend counts?

qplot(x = friend_count, data = pf)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#Experimenting to make it better
qplot(x = friend_count, data = subset(pf, friend_count < 1000), binwidth = 10)

How is this plot similar to Moira’s first plot?

Response: Some outliers have close to 5000 friends, which makes it hard to distinguish the finer differences among the majority of users, who have fewer than 1000 friends.

Long-tail data.


Limiting the Axes

Notes:

qplot(x = friend_count, data = pf, xlim = c(0, 1000))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2951 rows containing non-finite values (stat_bin).

#Same plot, different method
qplot(x = friend_count, data = pf) + 
  scale_x_continuous(limits = c(0, 1000))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2951 rows containing non-finite values (stat_bin).

Exploring with Bin Width

Notes:


Adjusting the Bin Width

Notes:

qplot(x = friend_count, data = pf, binwidth = 25) + 
  scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50))
## Warning: Removed 2951 rows containing non-finite values (stat_bin).

#Equivalent ggplot syntax: 
ggplot(aes(x = friend_count), data = pf) + 
  geom_histogram(binwidth = 25) + 
  scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50))
## Warning: Removed 2951 rows containing non-finite values (stat_bin).
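To see what a change of bin width does to the underlying counts, here is a minimal sketch using base R's `hist()` on hypothetical values. It is only meant to show the counts; the bin edges chosen by `hist()` need not match the ones ggplot2 uses for the plots above.

```r
# Hypothetical friend counts; not taken from pseudo_facebook.tsv.
friend_count <- c(3, 8, 12, 27, 31, 49, 52, 77, 98)

# Same data, two bin widths. plot = FALSE returns the bins without drawing.
wide   <- hist(friend_count, breaks = seq(0, 100, by = 50), plot = FALSE)
narrow <- hist(friend_count, breaks = seq(0, 100, by = 25), plot = FALSE)

wide$counts    # 6 3
narrow$counts  # 3 3 1 2
```

The narrower bins reveal a dip between 50 and 75 that the wider bins average away.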

Faceting Friend Count

# What code would you add to facet the histogram by gender?
# Add it to the code below.
qplot(x = friend_count, data = pf, binwidth = 10) +
  scale_x_continuous(limits = c(0, 1000),
                     breaks = seq(0, 1000, 50)) +
  facet_wrap(~gender, ncol = 1, strip.position = "bottom")
## Warning: Removed 2951 rows containing non-finite values (stat_bin).

In the alternate solution below, the period or dot in the formula for facet_grid() means that no variable is used for faceting in that dimension (here, the columns). Essentially, the notation gender ~ . splits up the data by gender and produces three histograms (female, male, and NA), each in its own row.

qplot(x = friend_count, data = pf) + 
  facet_grid(gender ~ .) 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
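The effect of which side of the formula the variable sits on can be sketched with a toy data frame. The data and object names below are hypothetical; only `facet_grid()` and its formula notation come from the lesson.

```r
library(ggplot2)

# Hypothetical toy data; not the real data set.
toy <- data.frame(
  gender = c("female", "female", "male", "male", "male"),
  friend_count = c(10, 250, 40, 90, 600)
)

base_plot <- ggplot(toy, aes(x = friend_count)) +
  geom_histogram(binwidth = 100)

# gender ~ . : one row of panels per gender, nothing varies across columns.
by_row <- base_plot + facet_grid(gender ~ .)

# . ~ gender : one column of panels per gender, nothing varies across rows.
by_col <- base_plot + facet_grid(. ~ gender)
```

Printing `by_row` stacks the panels vertically, which makes it easier to compare positions along a shared x-axis; `by_col` places them side by side.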

#my version
qplot(x = friend_count, data = pf, binwidth = 10) +
  scale_x_continuous(limits = c(0, 1000),
                     breaks = seq(0, 1000, 50)) +
  facet_grid(gender ~ .)
## Warning: Removed 2951 rows containing non-finite values (stat_bin).

#Equivalent ggplot syntax: 
ggplot(aes(x = friend_count), data = pf) + 
  geom_histogram() + 
  scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50)) + 
  facet_wrap(~gender, ncol = 1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2951 rows containing non-finite values (stat_bin).


Omitting NA Values

Notes:

qplot(x = friend_count, data = subset(pf, !is.na(gender)) ,binwidth = 10) +
  scale_x_continuous(limits = c(0, 1000),
                     breaks = seq(0, 1000, 50)) +
  facet_wrap(~gender, strip.position = "bottom")
## Warning: Removed 2949 rows containing non-finite values (stat_bin).

#Equivalent ggplot syntax: 
ggplot(aes(x = friend_count), data = subset(pf, !is.na(gender))) + 
  geom_histogram(binwidth = 10) + 
  scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50)) + 
  facet_wrap(~gender)
## Warning: Removed 2949 rows containing non-finite values (stat_bin).
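A note on why `subset(pf, !is.na(gender))` is used rather than `na.omit(pf)`: `na.omit()` drops a row if any column is missing, not just gender. A minimal sketch on a hypothetical toy data frame:

```r
# Hypothetical toy data with one missing gender and one missing tenure.
toy <- data.frame(
  gender = c("female", "male", NA, "male"),
  tenure = c(120, NA, 300, 45),
  friend_count = c(10, 20, 30, 40)
)

# Drops only the row whose gender is NA (3 rows remain).
known_gender <- subset(toy, !is.na(gender))

# Drops every row with an NA in any column (2 rows remain).
complete_rows <- na.omit(toy)

nrow(known_gender)
nrow(complete_rows)
```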


Statistics ‘by’ Gender

Notes:

table(pf$gender)
## 
## female   male 
##  40254  58574
by(pf$friend_count, pf$gender, summary)
## pf$gender: female
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0      37      96     242     244    4923 
## -------------------------------------------------------- 
## pf$gender: male
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0      27      74     165     182    4917

Who on average has more friends: men or women?

Response: women

What’s the difference between the median friend count for women and men?

Response: 22

Why would the median be a better measure than the mean?

Response: To keep extreme outliers from having too large an impact.
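A quick illustration of that point, using made-up numbers: adding a single extreme value moves the mean a long way but barely moves the median.

```r
friend_count <- c(10, 20, 30, 40, 50)   # hypothetical values
with_outlier <- c(friend_count, 4900)   # add one extreme user

mean_before   <- mean(friend_count)     # 30
median_before <- median(friend_count)   # 30

mean_after    <- mean(with_outlier)     # about 841.67
median_after  <- median(with_outlier)   # 35
```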


Tenure

Notes:

qplot(x = tenure, data = pf, binwidth = 30,
      color = I('black'), fill = I('#099DD9'))
## Warning: Removed 2 rows containing non-finite values (stat_bin).

#Equivalent ggplot syntax: 
ggplot(aes(x = tenure), data = pf) + 
   geom_histogram(binwidth = 30, color = 'black', fill = '#099DD9')
## Warning: Removed 2 rows containing non-finite values (stat_bin).


How would you create a histogram of tenure by year?

qplot(x = (tenure/365), data = pf, binwidth = .25,
      color = I('black'), fill = I('#099009') ) + 
  scale_x_continuous(breaks = seq(0, 7, 1), lim = c(0, 7) )
## Warning: Removed 26 rows containing non-finite values (stat_bin).

ggplot(aes(x = tenure/365), data = pf) + 
   geom_histogram(binwidth = .25, color = 'black', fill = '#F79420') + 
    scale_x_continuous(breaks = seq(1, 7, 1), lim = c(0, 7) )
## Warning: Removed 26 rows containing non-finite values (stat_bin).


Labeling Plots

Notes:

qplot(x = (tenure/365), data = pf, binwidth = .25,
      color = I('black'), fill = I('#099009'), 
      xlab = 'Number of years using Facebook', 
      ylab = 'Number of users in sample') + 
  scale_x_continuous(breaks = seq(0, 7, 1), lim = c(0, 7) )
## Warning: Removed 26 rows containing non-finite values (stat_bin).

#Equivalent ggplot syntax: 
ggplot(aes(x = tenure / 365), data = pf) + 
  geom_histogram(color = 'black', fill = '#F79420') + 
  scale_x_continuous(breaks = seq(1, 7, 1), limits = c(0, 7)) + 
  xlab('Number of years using Facebook') + 
  ylab('Number of users in sample')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 26 rows containing non-finite values (stat_bin).


User Ages

Notes:

qplot(x = age, data = pf, binwidth = 1) + 
  geom_histogram(color = 'black', fill = '#099009') #+ 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

  #scale_x_continuous()

ggplot(aes(x = age), data = pf) + 
  geom_histogram(color = 'black', fill = '#099009', binwidth = 1) + 
  scale_x_continuous(breaks = seq(10, 120, 2), lim = c(10, 120))

#From course: Equivalent ggplot syntax: 
ggplot(aes(x = age), data = pf) + 
  geom_histogram(binwidth = 1, fill = '#5760AB') + 
  scale_x_continuous(breaks = seq(0, 113, 5))

What do you notice?

Response: Strange outliers: way too many users are 102 and 108 years old. There’s an odd drop at ages 22 and 24, while 21, 23 and 25 are higher. The mode of the sample is 18, with roughly 5100 users; 19 and 23 come next with roughly 4450 users each. Above the minimum age, the distribution is right-skewed: it has a long tail toward older ages, and the mean (37.28) is well above the median (28). No users are under 13, which is due to legal requirements.
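The mode above was read off the histogram. Base R has no built-in function for the statistical mode (`mode()` reports an object's storage type), so one way to compute it is via `table()`. A sketch on hypothetical ages:

```r
# Hypothetical ages; not the real data.
age <- c(18, 18, 18, 19, 19, 23, 23, 25, 40, 102)

# Frequency of each distinct age.
age_counts <- table(age)

# The most frequent value. names() returns it as text, so convert back.
age_mode <- as.numeric(names(age_counts)[which.max(age_counts)])
age_mode  # 18
```

Note that `which.max()` returns only the first maximum, so ties between equally frequent values are not reported by this sketch.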

The Spread of Memes

Notes:


Lada’s Money Bag Meme

Notes:


Transforming Data

Notes:

summary(pf$friend_count)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0    31.0    82.0   196.4   206.0  4923.0
summary(log10(pf$friend_count + 1)) #+1 to avoid -Inf from log10(0) for users with 0 friends
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.505   1.919   1.868   2.316   3.692
summary(sqrt(pf$friend_count))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   5.568   9.055  11.090  14.350  70.160
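Why the `+ 1` is needed for the log transform but not for the square root, shown on a few hypothetical values:

```r
friend_count <- c(0, 9, 99, 999)          # hypothetical values, including a zero

raw_log     <- log10(friend_count)        # first element is -Inf
shifted_log <- log10(friend_count + 1)    # 0 1 2 3
root        <- sqrt(friend_count)         # defined at 0, so no shift is needed

raw_log
shifted_log
root
```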

Add a Scaling Layer

Notes:

#install.packages("gridExtra")
library(gridExtra)

p1 <- qplot(x = friend_count, data = pf, binwidth=10)

p2 <- qplot(x = friend_count+1, data = pf) + 
  scale_x_log10() +
  xlab("Friend count, logarithmic scale")

p3 <- qplot(x = friend_count, data = pf) + 
  scale_x_sqrt() + 
  xlab("Friend count, square root scale")
#Alternative square root plot
p4 <- qplot(x = sqrt(friend_count), data = pf)
#Alternative log plot
p5 <- qplot(x = log10(friend_count+1), data = pf)

grid.arrange(p1, p2, p3, ncol = 1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#Alternate solution in ggplot
p1 <- ggplot(aes(x=friend_count), data = pf) +
  geom_histogram()
p2 <- p1 + scale_x_log10()
p3 <- p1 + scale_x_sqrt()

grid.arrange(p1, p2, p3, ncol = 1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Transformation introduced infinite values in continuous x-axis
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1962 rows containing non-finite values (stat_bin).
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


Frequency Polygons

Good for comparing 2 or more distributions at once

#without
qplot(x = friend_count, data = subset(pf, !is.na(gender)),
      binwidth = 10) +
  scale_x_continuous(lim = c(0, 1000), breaks = seq(0, 1000, 50) ) + 
  facet_wrap(~gender)
## Warning: Removed 2949 rows containing non-finite values (stat_bin).

#with
qplot(x = friend_count, data = subset(pf, !is.na(gender)),
      binwidth = 10, geom = 'freqpoly', color = gender) +
  scale_x_continuous(lim = c(0, 1000), breaks = seq(0, 1000, 50) )
## Warning: Removed 2949 rows containing non-finite values (stat_bin).
## Warning: Removed 4 rows containing missing values (geom_path).

#Using proportions instead of raw count
qplot(x = friend_count, y = ..count../sum(..count..), 
      data = subset(pf, !is.na(gender)),
      xlab = 'Friend Count',
      ylab = 'Proportions of users with that friend count',
      binwidth = 10, geom = 'freqpoly', color = gender) +
  scale_x_continuous(lim = c(0, 1000), breaks = seq(0, 1000, 50) )
## Warning: Removed 2949 rows containing non-finite values (stat_bin).

## Warning: Removed 4 rows containing missing values (geom_path).

#Equivalent ggplot syntax: 
ggplot(aes(x = friend_count, y = ..count../sum(..count..)), data = subset(pf, !is.na(gender))) + 
  geom_freqpoly(aes(color = gender), binwidth=10) + 
  scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50)) + 
  xlab('Friend Count') + 
  ylab('Proportion of users with that friend count')
## Warning: Removed 2949 rows containing non-finite values (stat_bin).

## Warning: Removed 4 rows containing missing values (geom_path).

#more accurate, as it shows proportions per color (gender), not of the total
qplot(x = friend_count, y = ..density../sum(..density..), 
      data = subset(pf, !is.na(gender)),
      xlab = 'Friend Count',
      ylab = 'Proportions of users with that friend count',
      binwidth = 10, geom = 'freqpoly', color = gender) +
  scale_x_continuous(lim = c(0, 1000), breaks = seq(0, 1000, 50) )
## Warning: Removed 2949 rows containing non-finite values (stat_bin).

## Warning: Removed 4 rows containing missing values (geom_path).
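The difference between dividing by the grand total and dividing within each gender can be sketched in base R. The toy data and bin edges below are hypothetical, and the group sizes are deliberately unequal; this illustrates the idea and is not a reproduction of what ggplot2 computes internally.

```r
# Hypothetical toy data: 4 women and 8 men.
toy <- data.frame(
  gender = rep(c("female", "male"), times = c(4, 8)),
  friend_count = c(5, 15, 15, 25,  5, 5, 5, 15, 15, 25, 25, 25)
)

# Put each user into a bin of width 10 and count users per gender and bin.
toy$bin <- cut(toy$friend_count, breaks = c(0, 10, 20, 30))
counts <- table(toy$gender, toy$bin)

# Share of ALL users: the smaller group always looks lower.
share_of_total <- counts / sum(counts)

# Share WITHIN each gender: each row sums to 1, so shapes are comparable.
share_within <- prop.table(counts, margin = 1)

share_of_total
share_within
```

With unequal group sizes, only the within-group version lets the two curves be compared on equal footing.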

Quiz:

#Quick overlook at the data
summary(pf$www_likes)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     0.00     0.00     0.00    49.96     7.00 14860.00
#More detailed percentile distribution
quantile(pf$www_likes, prob = seq(0, 1, length = 101), type = 5)
##       0%       1%       2%       3%       4%       5%       6%       7% 
##     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00 
##       8%       9%      10%      11%      12%      13%      14%      15% 
##     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00 
##      16%      17%      18%      19%      20%      21%      22%      23% 
##     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00 
##      24%      25%      26%      27%      28%      29%      30%      31% 
##     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00 
##      32%      33%      34%      35%      36%      37%      38%      39% 
##     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00 
##      40%      41%      42%      43%      44%      45%      46%      47% 
##     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00 
##      48%      49%      50%      51%      52%      53%      54%      55% 
##     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00 
##      56%      57%      58%      59%      60%      61%      62%      63% 
##     0.00     0.00     0.00     0.00     0.00     0.00     1.00     1.00 
##      64%      65%      66%      67%      68%      69%      70%      71% 
##     1.00     1.00     1.00     2.00     2.00     2.00     3.00     3.00 
##      72%      73%      74%      75%      76%      77%      78%      79% 
##     4.00     5.00     6.00     7.00     8.00     9.00    11.00    12.00 
##      80%      81%      82%      83%      84%      85%      86%      87% 
##    14.00    17.00    19.00    23.00    27.00    31.00    36.00    42.00 
##      88%      89%      90%      91%      92%      93%      94%      95% 
##    50.00    60.00    72.00    86.00   104.00   128.00   160.00   208.00 
##      96%      97%      98%      99%     100% 
##   276.00   378.00   568.00  1001.47 14865.00
qplot(x = www_likes, y = ..density../sum(..density..), 
      data = subset(pf, !is.na(gender)),
      xlab = 'Likes',
      ylab = 'Proportions of users with that many likes',
      binwidth = 10, geom = 'freqpoly', color = gender) +
  scale_x_continuous(lim = c(1, 208), breaks = seq(0, 208, 10) )
## Warning: Removed 65873 rows containing non-finite values (stat_bin).
## Warning: Removed 8 rows containing missing values (geom_path).

#Solution from video, equivalent ggplot syntax
ggplot(aes(x = www_likes), data = subset(pf, !is.na(gender))) + 
  geom_freqpoly(aes(color = gender)) + 
  scale_x_log10()
## Warning: Transformation introduced infinite values in continuous x-axis
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 60935 rows containing non-finite values (stat_bin).


Likes on the Web

Notes:

sum(subset(pf, gender == 'male')$www_likes)
## [1] 1430175
sum(subset(pf, gender == 'female')$www_likes)
## [1] 3507665
#Alternate solution from video
by(pf$www_likes, pf$gender, sum)
## pf$gender: female
## [1] 3507665
## -------------------------------------------------------- 
## pf$gender: male
## [1] 1430175
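Two further base-R ways to get the same grouped sums are `tapply()` and `aggregate()`. This is a sketch on hypothetical toy data, not output from the real data set; both calls leave out the row with missing gender.

```r
# Hypothetical toy data, including one user with missing gender.
toy <- data.frame(
  gender    = c("female", "male", "female", NA, "male"),
  www_likes = c(10, 3, 25, 7, 0)
)

# Named vector of sums, one entry per gender.
likes_by_gender <- tapply(toy$www_likes, toy$gender, sum)
likes_by_gender

# The same result as a data frame, using the formula interface.
likes_df <- aggregate(www_likes ~ gender, data = toy, FUN = sum)
likes_df
```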

Box Plots

Notes:

qplot( x = gender, y = friend_count, 
       data = subset(pf, !is.na(gender) & friend_count < 1000 ), 
       geom = 'boxplot' )

#Alternate solution 1 from video
qplot( x = gender, y = friend_count, 
       data = subset(pf, !is.na(gender)), 
       geom = 'boxplot', 
       ylim = c(0,1000) )
## Warning: Removed 2949 rows containing non-finite values (stat_boxplot).

#Alternate solution 2 from video
qplot(x = gender, y = friend_count, 
      data = subset(pf, !is.na(gender)),
      geom = 'boxplot') + 
  scale_y_continuous(limits = c(0, 1000))
## Warning: Removed 2949 rows containing non-finite values (stat_boxplot).

#Alternate solution 3 from video: most accurate (does not remove data points)
qplot(x = gender, y = friend_count, 
      data = subset(pf, !is.na(gender)),
      geom = 'boxplot') + 
  coord_cartesian(ylim = c(0, 1000))
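Why removing data points matters for a box plot: the box is drawn from quartiles, and those change when rows are dropped before they are computed. A minimal sketch with made-up values:

```r
friend_count <- c(10, 50, 100, 400, 900, 1500, 3000)  # hypothetical values

# Dropping values above 1000 first (as subsetting or axis limits do)
# changes the statistics that the box plot draws.
median_trimmed <- median(friend_count[friend_count <= 1000])  # 100

# Keeping all values and only zooming the view leaves them unchanged.
median_full <- median(friend_count)                            # 400
```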

Adjust the code to focus on users who have friend counts between 0 and 1000.

#See above

Box Plots, Quartiles, and Friendships

Notes:

qplot(x = gender, y = friend_count, 
      data = subset(pf, !is.na(gender)),
      geom = 'boxplot') + 
  coord_cartesian(ylim = c(0, 250))

by(pf$friend_count, pf$gender, summary)
## pf$gender: female
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0      37      96     242     244    4923 
## -------------------------------------------------------- 
## pf$gender: male
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0      27      74     165     182    4917

On average, who initiated more friendships in our sample: men or women?

Response:

by(pf$friendships_initiated, pf$gender, summary)
## pf$gender: female
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0    19.0    49.0   113.9   124.8  3654.0 
## -------------------------------------------------------- 
## pf$gender: male
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0    15.0    44.0   103.1   111.0  4144.0
by(pf$friendships_initiated, pf$gender, mean)
## pf$gender: female
## [1] 113.8991
## -------------------------------------------------------- 
## pf$gender: male
## [1] 103.0666

Write about some ways that you can verify your answer.

Response:

qplot(x = gender, y = friendships_initiated, 
      data = subset(pf, !is.na(gender)), 
      geom = 'boxplot') + 
  coord_cartesian(ylim = c(0, 150))

Response: I found out which gender initiates more friendships on average by running the by() function on friendships_initiated and gender. I also looked at the median and the quartiles: for both males and females the mean is closer to the 3rd quartile than to the median. This seems to be due to a few extreme outlier users who have sent out a very large number of friend requests.

Getting Logical

Notes:

summary(pf$mobile_likes)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     0.0     4.0   106.1    46.0 25110.0
#You often want to convert features with a lot of 0 values to a binary value (True/False)
#Logical variable
summary(pf$mobile_likes > 0)
##    Mode   FALSE    TRUE    NA's 
## logical   35056   63947       0
pf$mobile_check_in <- NA
pf$mobile_check_in <- ifelse(pf$mobile_likes > 0, 1, 0)
#Making into categorical type
pf$mobile_check_in <- factor(pf$mobile_check_in)
summary(pf$mobile_check_in)
##     0     1 
## 35056 63947
#Calculating the proportion of checked-in users
summary(pf$mobile_check_in)[2] / nrow(pf)
##         1 
## 0.6459097
#Solution from video
sum(pf$mobile_check_in == 1) / length(pf$mobile_check_in)
## [1] 0.6459097

Response: 64.59%
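A shorter route to the same kind of proportion, shown on hypothetical values: taking `mean()` of a logical vector treats TRUE as 1 and FALSE as 0, so no intermediate factor is needed.

```r
mobile_likes <- c(0, 3, 0, 12, 7)   # hypothetical values

used_mobile <- mobile_likes > 0     # logical vector
share <- mean(used_mobile)          # TRUE counts as 1, FALSE as 0
share                               # 0.6
```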

Analyzing One Variable

Reflection: I learned more R syntax. I hadn’t really used box plots before, so that was useful. Frequency polygons were also new to me, and I liked learning about them. In general, I got a refresher on different ways of approaching (mostly exploratory) data analysis.

Click KnitHTML to see all of your hard work and to have an html page of this lesson, your answers, and your notes!